Optimizing database definitions for a new database

ABSTRACT

Systems and methods for optimizing a definition for a database are provided. A method for optimizing a definition for a database, comprises receiving an input command to create a database object, receiving at least one extension corresponding to an estimated feature of the database, submitting the input command and the at least one extension to a knowledge base to determine an optimized command, and generating the optimized command.

TECHNICAL FIELD

The field generally relates to systems and methods for optimizing adefinition for a database and, in particular, systems and methods forautomating the optimization of database definitions for new databases.

BACKGROUND

Creating a database, such as a relational database, can be a complicatedprocess. To develop optimal definitions for a relational database, knownmethods require a significant amount of understanding of the databasebeing used, and of the data that will populate the database. Forexample, a Database Administrator (DBA) may spend years becoming anexpert in their field to eventually understand and follow all of thenuances that deliver the best performance for a database.

Beyond the role of a DBA, other database builders may not have anunderstanding of how to take advantage of their chosen product(s) andcreate optimal definitions for the database. Without proper definitions,such as definitions for the schema, a database may suffer fromperformance problems and unwanted overhead which consume, for example,central processing unit (CPU), Memory, and/or Disk Input/Output (I/O)resources. This could lead to service level agreements (SLAs) beingmissed due to unsatisfactory performance and outages.

Known solutions to fix relational database performance require a greatdeal of effort on the part of a DBA to analyze the existing schema, getadditional input from the database owner, and write new Data DefinitionLanguage (DDL) statements or commands. Depending on the scope of thedatabase, the time and money spent optimizing the database foracceptable performance may be significant.

Accordingly, there is a need for systems and methods which automate theoptimization of database definitions for databases.

SUMMARY

In general, exemplary embodiments of the invention include systems andmethods for optimizing a definition for a database and, in particular,systems and methods for automating the optimization of databasedefinitions for new databases.

According to an exemplary embodiment of the present invention, a systemfor optimizing a definition for a database, comprises an input modulecapable of receiving an input command to create a database object, andreceiving at least one extension corresponding to an estimated featureof the database, and an optimization module comprising a knowledge basecapable of receiving the input command and the at least one extensionfrom the input module, determining an optimized command, and generatingthe optimized command.

According to an exemplary embodiment of the present invention, a methodfor optimizing a definition for a database, comprises receiving an inputcommand to create a database object, receiving at least one extensioncorresponding to an estimated feature of the database, submitting theinput command and the at least one extension to a knowledge base todetermine an optimized command, and generating the optimized command.

According to an exemplary embodiment of the present invention, acomputer program product for optimizing a definition for a databasecomprises a computer readable storage medium having program instructionsembodied therewith, the program instructions executable by a processorto cause the processor to perform a method comprising receiving an inputcommand to create a database object, receiving at least one extensioncorresponding to an estimated feature of the database, submitting theinput command and the at least one extension to a knowledge base todetermine an optimized command, and generating the optimized command.

These and other exemplary embodiments of the invention will be describedor become apparent from the following detailed description of exemplaryembodiments, which is to be read in connection with the accompanyingdrawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the present invention will be described belowin more detail, with reference to the accompanying drawings, of which:

FIG. 1 is a flow diagram of a method for optimizing a definition for adatabase, according to an exemplary embodiment of the invention.

FIG. 2 is high-level diagram showing detail of a system for optimizing adefinition for a database, according to an exemplary embodiment of theinvention.

FIG. 3 illustrates a computer system in accordance with which one ormore components/steps of the techniques of the invention may beimplemented, according to an exemplary embodiment of the invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

Exemplary embodiments of the invention will now be discussed in furtherdetail with regard to systems and methods for optimizing a definitionfor a database and, in particular, to systems and methods that automatethe optimization of database definitions for new databases. Thisinvention may, however, be embodied in many different forms and shouldnot be construed as limited to the embodiments set forth herein.

As new database software versions and/or fix packs are released, a DBAmust continually keep abreast of this information and make correctdecisions about database definitions. Depending on a DBA's level ofexpertise, such attempts could become a continuous loop of trial anderror until the DDL is perfected.

Embodiments of the present invention include systems and methods thatautomate the optimization of database definitions for new databases,thereby saving time and resources, and increasing a DBA's productivity.Embodiments of the present invention may also result in a broadening ofthe spectrum of information technology (IT) professionals who arecapable of creating optimized database definitions.

As used herein a “database definition” can refer to a definingcharacteristic or feature of a database object. A database definitioncan be present as a command or plurality of commands as part of DDL (seedefinition of DDL below).

As used herein, a “database object” can include, but is not limited to,schema, tables, tablespace, row assignments, column assignments,relationships between tables, keys, indexes, views, triggers, aliases,data types, sequences, database partition groups, buffer pools and/orprocedures and queries for reading and writing information.

As used herein a “table” can refer to an organized set of data elements(e.g., values) using a model of vertical columns (which are identifiedby their name) and horizontal rows, with a cell being a unit where a rowand column intersect.

As used herein, a “relational database” can refer to a database that hasa collection of tables of data items described and organized accordingto a relational model. Data in a single table can represent a relation.It is to be understood that embodiments of the present invention are notnecessarily limited to relational databases, and may apply to structuresmaking up a database, such as, for example, indexes (e.g., DB2®,Oracle®, MS SQL Server, Sybase®), and to other types of databases, suchas, for example, hierarchical (e.g., IMS®), network (based on a neuralnetwork model), and NoSQL (based on models, JSON™ and open source, e.g.,column family, Hbase, Mongo®, Cassandra). As used herein an “index” of adatabase can refer to a data structure that improves the speed of dataretrieval operations on a database table. Indexes can be used to quicklylocate data without having to search every row in a database table everytime a database table is accessed. An index can be a copy of one or morecolumns of data from a database table that can be efficiently searched.

As used herein “tablespace” can refer to a database storage locationwhere data underlying database objects can be kept. A tablespace can bereferred to by name when creating database segments (e.g., databaseobjects which occupy physical space, such as table data and indexes).

As used herein “Data Definition Language” or “DDL” can refer to syntaxfor defining data structures. DDL can be used for commands that definedifferent structures in a database to establish database definitions.For example, DDL statements or DDL commands can create, modify, andremove database objects. DDL statements may include, for example,CREATE, ALTER, and DROP.

As used herein, “extensions” or “extensions to the DDL” can refer toadditions to the DDL which correspond to parameters provided by adatabase creator that address table and column behavior. The parameterscan include, but are not limited to, estimates of how often a column isupdated, the size of the data type, the average length of the column inpractice, and/or the domain of values and value ranges. Extensions canspecify, for example a type (e.g., update frequency of a column), avalue (e.g., 8 updates/hour), and any conditions (e.g., updating onlyoccurs after specific event).

Embodiments of the present invention can be used to create databasedefinitions, such as, for example, tablespace, table, and/or indexdefinitions using, for example, optimal configuration and columnordering to enable efficient database utilization and performance.

Typically, an optimal database definition created by a DBA is based on asignificant amount of experience and data knowledge, as well as anunderstanding of underlying relational models and storage engines of aproduct. Those that are not DBAs may have no knowledge of theseunderlying factors and, as a result, create databases that consume moreresources and are less likely to be as efficient as those a DBA woulddefine. In today's fast paced development cycle, DBA resources may notalways be readily available to developers. The embodiments of thepresent invention aim to reduce perfunctory DBA tasks, saving time andresources, and increasing productivity. The embodiments of the presentinvention further aim to reduce or eliminate the time required toconcentrate on database definition details.

Referring to FIG. 1, a flow diagram of a method for optimizing adefinition for a database, according to an embodiment of the presentinvention, is shown. The method 100 is an estimate approach that can beused when a database has no existing objects. The method includesreceiving an input command to create a database object (block 102),receiving at least one extension corresponding to an estimated featureof the database (block 104), submitting the input command and the atleast one extension to a knowledge base to determine an optimizedcommand (block 106), and generating the optimized command (block 108).

In accordance with an embodiment of the present invention, the input andoptimized commands each comprise one or more DDL statements. A pluralityof input commands and extensions can be generated by a user and includedin DDL. The estimated features of the database which correspond to theextensions include, but are not limited to, values specifying usageestimates, growth estimates and size estimates. More specifically, anestimated feature of the database can include, for example, an updatefrequency of a column, an update frequency of a row, a number of reads,a number of creates, a size of a data type, an average size of a column,an average size of a row, the domain of values, value ranges, and/or agrowth rate of a table. Average sizes can include, for example, anaverage row size of a variable character field (varchar), blog, and/orextensible markup language (XML) columns.

If the input command is not already optimal, the optimized command isgenerated by altering the input command. The method 100 can furthercomprise automatically executing the optimized command against thedatabase (block 110), and/or outputting syntax of the optimized commandto a user (block 112). The optimized command can result in, for example,reordering table columns, adding an index and/or resizing databasetablespace.

In accordance with an embodiment of the present invention, the method100 utilizes extensions in the DDL, which are parameters provided by adatabase creator that are used when a database has no existing objects.In accordance with an embodiment of the present invention, extensionsare made to the DDL to accept these optional parameters that addresstable and column behavior to optimize the input commands. The existenceof these new parameters will signal a database engine to submit theextensions and input commands to a knowledge base. According to anembodiment, the knowledge base returns an optimized DDL and an alterdatabase command if necessary (e.g., when the input command or DDL isnot already optimal). As noted above with respect to block 110,optimized DDL files can be executed against the database. As notedabove, possible alterations to a DDL can result in reordering tablecolumns, adding indexes and/or resizing the database tablespace.

Referring to FIG. 2, which is a high-level diagram of a system 200 foroptimizing a definition for a database, according to an embodiment ofthe present invention, the system includes an input module 202 capableof receiving an input command to create a database object, and receivingat least one extension corresponding to an estimated feature of thedatabase, and an optimization module 204 connected to the input module202. The optimization module 204 includes a knowledge base 206 capableof receiving the input command and the at least one extension from theinput module 202, determining an optimized command, and generating theoptimized command.

The system can further include an execution module 208 capable ofautomatically executing the optimized command against the database, andan output module 210 capable of outputting syntax of the optimizedcommand to a user.

In accordance with an embodiment of the present invention, the knowledgebase 206 includes an internal repository of decision trees andquantifiable information about database objects through which the inputcommands and extensions are processed to permit the knowledge base 206to generate, for example, optimized create and alter commands. Theknowledge base 206 can be extended for each database software release toembody the necessary extensions to reflect improvements in the databaseengine that may be derived via the methods of the embodiments of thepresent invention.

In accordance with an embodiment of the present invention, the knowledgebase 206 may be figuratively viewed as a “black box.” The knowledge base206 can be designed by the same team that develops software for a givendatabase that is being optimized. The knowledge base 206 can, forexample, work for DB2® and other types of database products. Accordingto embodiments of the present invention, the knowledge base 206 maycontain the actual size of each type of column, logic that would acceptthe additional input parameters from the DDL and process the parameters.In accordance with an embodiment, decision trees in a knowledge base canprocess each parameter for each object and output new DDL statements.The output of the knowledge base can result in actions such as, forexample, reordering columns in a table, putting tables in differenttable spaces and adding indexes to a database. The outputs can bedifferent for each type of database product. Different inputs could beadded that would result in different types of resulting actions.

In accordance with an embodiment of the present invention, a databasecreator creates object definition syntax, and adds options (extensions),which are processed by the knowledge base 206 to determine and generatethe optimized object definitions. In a specific example, a databasecreator may submit via the input module 202, a “Create Table” objectincluding extensions to the DDL for a column object. The extensions fora column object can include, for example, estimated parameters anddescriptions, such as number and time units (e.g., 8 times/hour) inconnection with column update frequency, number and size units (e.g., 4million cells) in connection with column length, and an interval inconnection with a column value range. Additional column behaviorattribute extensions can also be submitted. The “Create Table” objectand extensions to the DDL for a column object are processed by theknowledge base 206 to determine and generate the optimized column objectdefinitions, which can then be presented to the database creator in theform new syntax, or automatically executed against a database.

In accordance with embodiments of the present invention, descriptions ofhow the DDL could be changed are set forth below. Two applicable typesof objects that could be extended with additional properties to the DDLare addressed below as they relate to DB2®. However, it is to beunderstood that embodiments of the present invention are not necessarilylimited thereto, and other types of objects could be altered withadditional properties. The two object types are columns and tables, andproperties added to the DDL language are optional.

    COLUMN PROPERTIES SYNTAX create table TABLE_NAME (COLUMN_NAMEINTEGER { column_average_update (#, TIME_UNIT), column_sum_update (#,TIME_UNIT), column_min_update (#, TIME_UNIT), column_max_update (#,TIME_UNIT), column_average_read (#,TIME_UNIT), column_sum_read (#,TIME_UNIT), column_min_read (#, TIME_UNIT), column_max_read (#,TIME_UNIT), column_average_insert (#, TIME_UNIT), column_sum_insert (#,TIME_UNIT), column_min_insert (#, TIME_UNIT), column_max_insert (#,TIME_UNIT), column_average_size (#, SIZE_UNIT), column_value_range(NUMBER_RANGE OR VALUE_RANGE), column_predicate_usage (%, #),column_data_patterns (SORT_ORDER) }, ... )     TABLE PROPERTY SYNTAXcreate table TABLE_NAME (COLUMN_NAME INTEGER, ... ) TABLE_GROWTH_PATTERN(#,TIME_UNIT), TABLE_DELETE_PATTERN (#,TIME_UNIT)

In accordance with embodiments of the present invention, the followingparameter descriptions may apply. TIME_UNIT can refer to, for example,daily, weekly, monthly, yearly, hourly, per minute, per second, etc.SIZE_UNIT can refer to, for example, Byte (B), kilo-byte (KB), mega-byte(MB), giga-byte (GB), tera-byte (TB), etc. SORT_ORDER can refer to, forexample, ascending (asc), descending (desc), random. NUMBER_RANGE canrefer to, for example, # . . . #, e.g.: 1 . . . 1000. VALUE_RANGE canrefer to, for example, varchar, varchar, varchar, e.g.: Cat, Dog,Gerbil.

    COLUMN PROPERTY EXAMPLE create table TABLE_NAME (COLUMN_NAME INTEGER{ column_average_update ( 100, daily), column_sum_update (200, daily),column_min_update (10, daily), column_max_update (500, daily),column_average_read (1000 ,daily), column_sum_read (800, daily),column_min_read (400, daily), column_max_read (2000, daily),column_average_insert (100, daily), column_sum_insert (200, daily),column_min_insert (50, daily), column_max_insert (500, daily),column_average_size (5, KB), column_value_range (1...1000),column_predicate_usage (100%,10), column_data_patterns (asc) }, ... )    TABLE PROPERTY EXAMPLE create table TABLE_NAME (COLUMN_NAME_INTEGER,... ) TABLE_GROWTH_PATTERN (500, monthly), TABLE_DELETE_PATTERN(100,weekly)

As used herein, “column_average_update (#, TIME_UNIT)” refers to theexpected average number of updates to be made within the provided timeunit.

As used herein, “column_sum_update (#, TIME_UNIT)” refers to theexpected summation of updates to be made within the provided time unit.

As used herein, “column_min_update (#, TIME_UNIT)” refers to theexpected minimum number of updates to be made within the provided timeunit.

As used herein, “column_max_update (#, TIME_UNIT)” refers to theexpected maximum number of updates to be made within the provided timeunit.

As used herein, “column_average_read (#, TIME_UNIT)” refers to theexpected average number of reads to be made within the provided timeunit.

As used herein, “column_sum_read (#, TIME_UNIT)” refers to the expectedsummation of reads to be made within the provided time unit.

As used herein, “column_min_read (#, TIME_UNIT)” refers to the expectedminimum number of reads to be made within the provided time unit.

As used herein, “column_max_read (#, TIME_UNIT)” refers to the expectedmaximum number of reads to be made within the provided time unit.

As used herein, “column_average_insert (#, TIME_UNIT)” refers to theexpected average number of inserts to be made within the provided timeunit.

As used herein, “column_sum_insert (#, TIME_UNIT)” refers to theexpected summation of inserts to be made within the provided time unit.

As used herein, “column_min_insert (#, TIME_UNIT)” refers to theexpected minimum number of inserts to be made within the provided timeunit.

As used herein, “column_max_insert (#, TIME_UNIT)” refers to theexpected maximum number of inserts to be made within the provided timeunit.

As used herein, “column_average_size (#, SIZE_UNIT)” refers to theaverage actual column size and the related size unit.

As used herein, “column_value_range (NUMBER_RANGE OR VALUE_RANGE)”refers to the range or list of values expected for a given column.

As used herein, with respect to “column_predicate_usage (FREQUENCY,RANKING)”, the FREQUENCY refers to a percentage value that indicates thefrequency with which this column will be used in a “where” clause forstatements. For example, 100% means the column will be used in the“where” clause for every applicable statement that is executed on therelated table. The RANKING value refers to a number that indicates theimportance of the column in the statement to the application. Allowablevalues are, for example, 0 through 10, where 0 means the column is notbeing ranked. If the RANKING parameter is not used for a given column, 0can be the default assumption. 10 can mean that the column is, the mostimportant part of the overall statement of the columns that were ranked.1 can mean that the column is the least important of the other columnsthat were ranked.

As used herein, with respect to “column_data_patterns (SORT_ORDER),” theSORT_ORDER value can be, for example, ascending (asc), descending (desc)or random. It indicates the order in which the values are sorted if thevalues will be inserted into one of these patterns. Can be used for anytype of field, but is best applied to a column of type date ortimestamp.

As used herein, “TABLE_GROWTH_PATTERN (#, TIME_UNIT)” can refer to thenumber of rows that will be added to a table during the provided timeunit.

As used herein, “TABLE_DELETE_PATTERN (#, TIME_UNIT)” can refer to thenumber of rows that will be deleted from a given table during theprovided time unit.

The present invention may be a system, a method, and/or a computerprogram product. The computer program product may include a computerreadable storage medium (or media) having computer readable programinstructions thereon for causing a processor to carry out aspects of thepresent invention.

The computer readable storage medium can be a tangible device that canretain and store instructions for use by an instruction executiondevice. The computer readable storage medium may be, for example, but isnot limited to, an electronic storage device, a magnetic storage device,an optical storage device, an electromagnetic storage device, asemiconductor storage device, or any suitable combination of theforegoing. A non-exhaustive list of more specific examples of thecomputer readable storage medium includes the following: a portablecomputer diskette, a hard disk, a random access memory (RAM), aread-only memory (ROM), an erasable programmable read-only memory (EPROMor Flash memory), a static random access memory (SRAM), a portablecompact disc read-only memory (CD-ROM), a digital versatile disk (DVD),a memory stick, a floppy disk, a mechanically encoded device such aspunch-cards or raised structures in a groove having instructionsrecorded thereon, and any suitable combination of the foregoing. Acomputer readable storage medium, as used herein, is not to be construedas being transitory signals per se, such as radio waves or other freelypropagating electromagnetic waves, electromagnetic waves propagatingthrough a waveguide or other transmission media (e.g., light pulsespassing through a fiber-optic cable), or electrical signals transmittedthrough a wire.

Computer readable program instructions described herein can bedownloaded to respective computing/processing devices from a computerreadable storage medium or to an external computer or external storagedevice via a network, for example, the Internet, a local area network, awide area network and/or a wireless network. The network may comprisecopper transmission cables, optical transmission fibers, wirelesstransmission, routers, firewalls, switches, gateway computers and/oredge servers. A network adapter card or network interface in eachcomputing/processing device receives computer readable programinstructions from the network and forwards the computer readable programinstructions for storage in a computer readable storage medium withinthe respective computing/processing device.

Computer readable program instructions for carrying out operations ofthe present invention may be assembler instructions,instruction-set-architecture (ISA) instructions, machine instructions,machine dependent instructions, microcode, firmware instructions,state-setting data, or either source code or object code written in anycombination of one or more programming languages, including an objectoriented programming language such as Smalltalk, C++ or the like, andconventional procedural programming languages, such as the “C”programming language or similar programming languages. The computerreadable program instructions may execute entirely on the user'scomputer, partly on the user's computer, as a stand-alone softwarepackage, partly on the user's computer and partly on a remote computeror entirely on the remote computer or server. In the latter scenario,the remote computer may be connected to the user's computer through anytype of network, including a local area network (LAN) or a wide areanetwork (WAN), or the connection may be made to an external computer(for example, through the Internet using an Internet Service Provider).In some embodiments, electronic circuitry including, for example,programmable logic circuitry, field-programmable gate arrays (FPGA), orprogrammable logic arrays (PLA) may execute the computer readableprogram instructions by utilizing state information of the computerreadable program instructions to personalize the electronic circuitry,in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference toflowchart illustrations and/or block diagrams of methods, apparatus(systems), and computer program products according to embodiments of theinvention. It will be understood that each block of the flowchartillustrations and/or block diagrams, and combinations of blocks in theflowchart illustrations and/or block diagrams, can be implemented bycomputer readable program instructions.

These computer readable program instructions may be provided to aprocessor of a general purpose computer, special purpose computer, orother programmable data processing apparatus to produce a machine, suchthat the instructions, which execute via the processor of the computeror other programmable data processing apparatus, create means forimplementing the functions/acts specified in the flowchart and/or blockdiagram block or blocks. These computer readable program instructionsmay also be stored in a computer readable storage medium that can directa computer, a programmable data processing apparatus, and/or otherdevices to function in a particular manner, such that the computerreadable storage medium having instructions stored therein comprises anarticle of manufacture including instructions which implement aspects ofthe function/act specified in the flowchart and/or block diagram blockor blocks.

The computer readable program instructions may also be loaded onto acomputer, other programmable data processing apparatus, or other deviceto cause a series of operational steps to be performed on the computer,other programmable apparatus or other device to produce a computerimplemented process, such that the instructions which execute on thecomputer, other programmable apparatus, or other device implement thefunctions/acts specified in the flowchart and/or block diagram block orblocks.

The flowchart and block diagrams in the Figures illustrate thearchitecture, functionality, and operation of possible implementationsof systems, methods, and computer program products according to variousembodiments of the present invention. In this regard, each block in theflowchart or block diagrams may represent a module, segment, or portionof instructions, which comprises one or more executable instructions forimplementing the specified logical function(s). In some alternativeimplementations, the functions noted in the block may occur out of theorder noted in the figures. For example, two blocks shown in successionmay, in fact, be executed substantially concurrently, or the blocks maysometimes be executed in the reverse order, depending upon thefunctionality involved. It will also be noted that each block of theblock diagrams and/or flowchart illustration, and combinations of blocksin the block diagrams and/or flowchart illustration, can be implementedby special purpose hardware-based systems that perform the specifiedfunctions or acts or carry out combinations of special purpose hardwareand computer instructions.

One or more embodiments can make use of software running on ageneral-purpose computer or workstation. With reference to FIG. 3, in acomputing node 310 there is a computer system/server 312, which isoperational with numerous other general purpose or special purposecomputing system environments or configurations. Examples of well-knowncomputing systems, environments, and/or configurations that may besuitable for use with computer system/server 312 include, but are notlimited to, personal computer systems, server computer systems, thinclients, thick clients, handheld or laptop devices, multiprocessorsystems, microprocessor-based systems, set top boxes, programmableconsumer electronics, network PCs, minicomputer systems, mainframecomputer systems, and distributed cloud computing environments thatinclude any of the above systems or devices, and the like.

Computer system/server 312 may be described in the general context ofcomputer system executable instructions, such as program modules, beingexecuted by a computer system. Generally, program modules may includeroutines, programs, objects, components, logic, data structures, and soon that perform particular tasks or implement particular abstract datatypes. Computer system/server 312 may be practiced in distributed cloudcomputing environments where tasks are performed by remote processingdevices that are linked through a communications network. In adistributed cloud computing environment, program modules may be locatedin both local and remote computer system storage media including memorystorage devices.

As shown in FIG. 3, computer system/server 312 in computing node 310 isshown in the form of a general-purpose computing device. The componentsof computer system/server 312 may include, but are not limited to, oneor more processors or processing units 316, a system memory 328, and abus 318 that couples various system components including system memory328 to processor 316.

The bus 318 represents one or more of any of several types of busstructures, including a memory bus or memory controller, a peripheralbus, an accelerated graphics port, and a processor or local bus usingany of a variety of bus architectures. By way of example, and notlimitation, such architectures include Industry Standard Architecture(ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA)bus, Video Electronics Standards Association (VESA) local bus, andPeripheral Component Interconnects (PCI) bus.

The computer system/server 312 typically includes a variety of computersystem readable media. Such media may be any available media that isaccessible by computer system/server 312, and it includes both volatileand non-volatile media, removable and non-removable media.

The system memory 328 can include computer system readable media in theform of volatile memory, such as random access memory (RAM) 330 and/orcache memory 332. The computer system/server 312 may further includeother removable/non-removable, volatile/nonvolatile computer systemstorage media. By way of example only, storage system 334 can beprovided for reading from and writing to a non-removable, non-volatilemagnetic media (not shown and typically called a “hard drive”). Althoughnot shown, a magnetic disk drive for reading from and writing to aremovable, non-volatile magnetic disk (e.g., a “floppy disk”), and anoptical disk drive for reading from or writing to a removable,non-volatile optical disk such as a CD-ROM, DVD-ROM or other opticalmedia can be provided. In such instances, each can be connected to thebus 318 by one or more data media interfaces. As depicted and describedherein, the memory 328 may include at least one program product having aset (e.g., at least one) of program modules that are configured to carryout the functions of embodiments of the invention. A program/utility340, having a set (at least one) of program modules 342, may be storedin memory 328 by way of example, and not limitation, as well as anoperating system, one or more application programs, other programmodules, and program data. Each of the operating system, one or moreapplication programs, other program modules; and program data or somecombination thereof, may include an implementation of a networkingenvironment. Program modules 342 generally carry out the functionsand/or methodologies of embodiments of the invention as describedherein.

Computer system/server 312 may also communicate with one or moreexternal devices 314 such as a keyboard, a pointing device, a display324, etc., one or more devices that enable a user to interact withcomputer system/server 312, and/or any devices (e.g., network card,modem, etc.) that enable computer system/server 312 to communicate withone or more other computing devices. Such communication can occur viaInput/Output (I/O) interfaces 322. Still yet, computer system/server 312can communicate with one or more networks such as a local area network(LAN), a general wide area network (WAN), and/or a public network (e.g.,the Internet) via network adapter 320. As depicted, network adapter 320communicates with the other components of computer system/server 312 viabus 318. It should be understood that although not shown, other hardwareand/or software components could be used in conjunction with computersystem/server 312. Examples, include, but are not limited to: microcode,device drivers, redundant processing units, external disk drive arrays,RAID systems, tape drives, and data archival storage systems, etc.

Although illustrative embodiments of the present invention have beendescribed herein with reference to the accompanying drawings, it is tobe understood that the invention is not limited to those preciseembodiments, and that various other changes and modifications may bemade by one skilled in the art without departing from the scope orspirit of the invention.

We claim:
 1. A system for optimizing a definition for a database,comprising: an input module capable of receiving an input command tocreate a database object, and receiving at least one extensioncorresponding to an estimated feature of the database; and anoptimization module comprising a knowledge base; wherein the knowledgebase includes design information corresponding to how a databasesoftware product processes and stores information in the database;wherein the knowledge base is capable of: receiving the input commandand the at least one extension from the input module; automaticallydetermining that the input command is not optimal for use with thedatabase software product; automatically determining an optimizedcommand for use with the database software product; and automaticallygenerating the optimized command by altering the input command; whereinthe input and optimized command each comprise one or more datadefinition language (DDL) statements; and wherein the optimized commandalters the input command to result in one or more modifications to a DDLstatement to conform the DDL statement to the database software productassociated with the knowledge base.
 2. The system according to claim 1,wherein the at least one extension is included in data definitionlanguage (DDL).
 3. The system according to claim 1, wherein theestimated feature of the database includes at least one of a usageestimate, a growth estimate and a size estimate.
 4. The system accordingto claim 1, wherein the estimated feature of the database includes atleast one of an update frequency of a column, an update frequency of arow, a size of a data type, an average size of a column, an average sizeof a row and a growth rate of a table.
 5. The system according to claim1, further comprising an execution module capable of automaticallyexecuting the optimized command against the database.
 6. The systemaccording to claim 1, further comprising an output module capable ofoutputting syntax of the optimized command to a user.
 7. The systemaccording to claim 1, wherein the knowledge base comprises a repositoryof decision trees and information about database objects to generateoptimized create and alter commands.
 8. The system according to claim 1,wherein the optimized command results in at least one of reorderingtable columns, adding an index and resizing database tablespace.
 9. Amethod for optimizing a definition for a database, comprising: receivingan input command to create a database object; receiving at least oneextension corresponding to an estimated feature of the database;submitting the input command and the at least one extension to aknowledge base to determine an optimized command; wherein the knowledgebase includes design information corresponding to how a databasesoftware product processes and stores information in the database;automatically determining that the input command is not optimal for usewith the database software product; automatically determining theoptimized command for use with the database software product; andautomatically generating the optimized command by altering the inputcommand; wherein the input and optimized command each comprise one ormore data definition language (DDL) statements; wherein the optimizedcommand alters the input command to result in one or more modificationsto a DDL statement to conform the DDL statement to the database softwareproduct associated with the knowledge base; and wherein the steps of themethod are performed by a computer system comprising a memory and atleast one processor coupled to the memory.
 10. The method according toclaim 9, wherein the at least one extension is included in datadefinition language (DDL).
 11. The method according to claim 9, whereinthe estimated feature of the database includes at least one of a usageestimate, a growth estimate and a size estimate.
 12. The methodaccording to claim 9, wherein the estimated feature of the databaseincludes at least one of an update frequency of a column, an updatefrequency of a row, a size of a data type, an average size of a column,an average size of a row and a growth rate of a table.
 13. The methodaccording to claim 9, further comprising automatically executing theoptimized command against the database.
 14. The method according toclaim 9, further comprising outputting syntax of the optimized commandto a user.
 15. The method according to claim 9, wherein the knowledgebase comprises a repository of decision trees and information aboutdatabase objects to generate optimized create and alter commands. 16.The method according to claim 9, wherein the optimized command resultsin at least one of reordering table columns, adding an index andresizing database tablespace.
 17. A computer program product foroptimizing a definition for a database, the computer program productcomprising a computer readable storage medium having programinstructions embodied therewith, the program instructions executable bya processor to cause the processor to perform a method comprising:receiving an input command to create a database object; receiving atleast one extension corresponding to an estimated feature of thedatabase; submitting the input command and the at least one extension toa knowledge base to determine an optimized command; wherein theknowledge base includes design information corresponding to how adatabase software product processes and stores information in thedatabase; automatically determining that the input command is notoptimal for use with the database software product; automaticallydetermining the optimized command for use with the database softwareproduct; and automatically generating the optimized command by alteringthe input command; wherein the input and optimized command each compriseone or more data definition language (DDL) statements; and wherein theoptimized command alters the input command to result in one or moremodifications to a DDL statement to conform the DDL statement to thedatabase software product associated with the knowledge base.